Overview

Dataset statistics

Number of variables: 16
Number of observations: 4,238
Missing cells: 645
Missing cells (%): 1.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 529.9 KiB
Average record size in memory: 128.0 B
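The overview totals can be cross-checked against the per-variable counts reported in the Variables section. This is a minimal Python sketch. It assumes that the seven columns below are the only ones with missing values, since the report lists 0 missing for every other column:

```python
# Per-column missing counts, copied from the Variables section of this report.
missing_by_column = {
    "education": 105,
    "cigsPerDay": 29,
    "BPMeds": 53,
    "totChol": 50,
    "BMI": 19,
    "heartRate": 1,
    "glucose": 388,
}

n_rows = 4238
n_columns = 16

total_missing = sum(missing_by_column.values())
missing_pct = 100 * total_missing / (n_rows * n_columns)

# Average record size: total memory (529.9 KiB, already rounded) divided by rows.
avg_record_bytes = 529.9 * 1024 / n_rows

print(total_missing)                # 645
print(f"{missing_pct:.1f}%")        # 1.0%
print(f"{avg_record_bytes:.1f} B")  # 128.0 B
```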

Variable types

Categorical: 8
Numeric: 8

Alerts

currentSmoker is highly correlated with cigsPerDay [High correlation]
cigsPerDay is highly correlated with currentSmoker [High correlation]
prevalentHyp is highly correlated with sysBP and 1 other field [High correlation]
sysBP is highly correlated with prevalentHyp and 1 other field [High correlation]
diaBP is highly correlated with prevalentHyp and 1 other field [High correlation]
diabetes is highly correlated with glucose [High correlation]
glucose is highly correlated with diabetes [High correlation]
education has 105 (2.5%) missing values [Missing]
BPMeds has 53 (1.3%) missing values [Missing]
totChol has 50 (1.2%) missing values [Missing]
glucose has 388 (9.2%) missing values [Missing]
cigsPerDay has 2144 (50.6%) zeros [Zeros]

Reproduction

Analysis started: 2022-10-12 02:48:17.357836
Analysis finished: 2022-10-12 02:48:23.751993
Duration: 6.39 seconds
Software version: pandas-profiling v3.3.0
Download configuration: config.json

Variables

male
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 240.2 KiB
0: 2419
1: 1819

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4,238
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
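Python's standard `unicodedata` module exposes the general category of each code point. In that module, `Nd` is "Decimal Number" and `Po` is "Other Punctuation", which are the two categories that appear in this report. The sketch below tallies categories only and does not cover script or block lookups. It is an illustration, not the profiler's own implementation:

```python
import unicodedata
from collections import Counter

def unicode_category_counts(values):
    """Tally the Unicode general category of every character in the given strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Binary columns such as "male" contain only digits ("Nd").
print(unicode_category_counts(["1", "0", "1"]))  # Counter({'Nd': 3})
# Float-formatted columns such as "education" also contain "." ("Po").
print(unicode_category_counts(["1.0", "2.0"]))   # Counter({'Nd': 4, 'Po': 2})
```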

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 1
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 2419 | 57.1%
1 | 1819 | 42.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
0 | 2419 | 57.1%
1 | 1819 | 42.9%

Most occurring characters

Value | Count | Frequency (%)
0 | 2419 | 57.1%
1 | 1819 | 42.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 4238 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 2419 | 57.1%
1 | 1819 | 42.9%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4238 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 2419 | 57.1%
1 | 1819 | 42.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4238 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 2419 | 57.1%
1 | 1819 | 42.9%

age
Real number (ℝ≥0)

Distinct: 39
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.58494573
Minimum: 32
Maximum: 70
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 32
5th percentile: 37
Q1: 42
Median: 49
Q3: 56
95th percentile: 64
Maximum: 70
Range: 38
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 8.572159925
Coefficient of variation (CV): 0.1728782758
Kurtosis: -0.9896358464
Mean: 49.58494573
Median Absolute Deviation (MAD): 7
Skewness: 0.2281457773
Sum: 210141
Variance: 73.48192578
Monotonicity: Not monotonic
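Several of these statistics are related to each other. The variance is the square of the standard deviation, and CV = standard deviation / mean. The sketch below shows these relations in Python. It assumes the profiler follows the pandas conventions: sample variance with n - 1 in the denominator, and the bias-corrected skewness (G1) and excess kurtosis (G2) used by `Series.skew()` and `Series.kurt()`. `describe` is a hypothetical helper name:

```python
import math

def describe(xs):
    """Summary statistics using pandas-style conventions (ddof=1, bias-corrected)."""
    n = len(xs)  # needs n >= 4 and non-constant data for kurtosis
    mean = sum(xs) / n
    devs = [x - mean for x in xs]
    variance = sum(d ** 2 for d in devs) / (n - 1)  # sample variance, ddof=1
    std = math.sqrt(variance)
    # Central moments with n in the denominator, used by the G1/G2 estimators.
    m2 = sum(d ** 2 for d in devs) / n
    m3 = sum(d ** 3 for d in devs) / n
    m4 = sum(d ** 4 for d in devs) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "cv": std / mean,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }

stats = describe([1, 2, 3, 4, 5])
print(stats["skewness"], round(stats["kurtosis"], 6))  # 0.0 -1.2

# CV for age, recomputed from the rounded values reported above.
print(round(8.572159925 / 49.58494573, 6))  # 0.172878
```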
Histogram with fixed size bins (bins=39)

Common values

Value | Count | Frequency (%)
40 | 191 | 4.5%
46 | 182 | 4.3%
42 | 180 | 4.2%
41 | 174 | 4.1%
48 | 173 | 4.1%
39 | 169 | 4.0%
44 | 166 | 3.9%
45 | 162 | 3.8%
43 | 159 | 3.8%
52 | 149 | 3.5%
Other values (29) | 2533 | 59.8%

Minimum 10 values

Value | Count | Frequency (%)
32 | 1 | < 0.1%
33 | 5 | 0.1%
34 | 18 | 0.4%
35 | 42 | 1.0%
36 | 84 | 2.0%
37 | 92 | 2.2%
38 | 144 | 3.4%
39 | 169 | 4.0%
40 | 191 | 4.5%
41 | 174 | 4.1%

Maximum 10 values

Value | Count | Frequency (%)
70 | 2 | < 0.1%
69 | 7 | 0.2%
68 | 18 | 0.4%
67 | 45 | 1.1%
66 | 38 | 0.9%
65 | 57 | 1.3%
64 | 93 | 2.2%
63 | 110 | 2.6%
62 | 99 | 2.3%
61 | 110 | 2.6%

education
Categorical

MISSING

Distinct: 4
Distinct (%): 0.1%
Missing: 105
Missing (%): 2.5%
Memory size: 246.4 KiB
1.0: 1720
2.0: 1253
3.0: 687
4.0: 473

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12,399
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4.0
2nd row: 2.0
3rd row: 1.0
4th row: 3.0
5th row: 3.0

Common Values

Value | Count | Frequency (%)
1.0 | 1720 | 40.6%
2.0 | 1253 | 29.6%
3.0 | 687 | 16.2%
4.0 | 473 | 11.2%
(Missing) | 105 | 2.5%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
1.0 | 1720 | 41.6%
2.0 | 1253 | 30.3%
3.0 | 687 | 16.6%
4.0 | 473 | 11.4%
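The two education tables use different denominators. "Common Values" divides by all 4,238 rows, so the 105 missing rows get their own line. The category frequency table divides by the 4,133 non-missing rows. A quick Python check using the counts above:

```python
# Counts for education taken from the tables above.
counts = {"1.0": 1720, "2.0": 1253, "3.0": 687, "4.0": 473}
n_missing = 105

n_non_missing = sum(counts.values())  # 4133
n_total = n_non_missing + n_missing   # 4238

for value, count in counts.items():
    pct_all = 100 * count / n_total             # "Common Values" denominator
    pct_observed = 100 * count / n_non_missing  # category frequency denominator
    print(f"{value}: {pct_all:.1f}% of all rows, {pct_observed:.1f}% of non-missing rows")
```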

Most occurring characters

Value | Count | Frequency (%)
. | 4133 | 33.3%
0 | 4133 | 33.3%
1 | 1720 | 13.9%
2 | 1253 | 10.1%
3 | 687 | 5.5%
4 | 473 | 3.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 8266 | 66.7%
Other Punctuation | 4133 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 4133 | 50.0%
1 | 1720 | 20.8%
2 | 1253 | 15.2%
3 | 687 | 8.3%
4 | 473 | 5.7%

Other Punctuation
Value | Count | Frequency (%)
. | 4133 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 12399 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
. | 4133 | 33.3%
0 | 4133 | 33.3%
1 | 1720 | 13.9%
2 | 1253 | 10.1%
3 | 687 | 5.5%
4 | 473 | 3.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 12399 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
. | 4133 | 33.3%
0 | 4133 | 33.3%
1 | 1720 | 13.9%
2 | 1253 | 10.1%
3 | 687 | 5.5%
4 | 473 | 3.8%

currentSmoker
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 240.2 KiB
0: 2144
1: 2094

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4,238
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
0 | 2144 | 50.6%
1 | 2094 | 49.4%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
0 | 2144 | 50.6%
1 | 2094 | 49.4%

Most occurring characters

Value | Count | Frequency (%)
0 | 2144 | 50.6%
1 | 2094 | 49.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 4238 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 2144 | 50.6%
1 | 2094 | 49.4%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4238 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 2144 | 50.6%
1 | 2094 | 49.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4238 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 2144 | 50.6%
1 | 2094 | 49.4%

cigsPerDay
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 33
Distinct (%): 0.8%
Missing: 29
Missing (%): 0.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.00308862
Minimum: 0
Maximum: 70
Zeros: 2144
Zeros (%): 50.6%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 20
95th percentile: 30
Maximum: 70
Range: 70
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 11.92009359
Coefficient of variation (CV): 1.324000473
Kurtosis: 1.023355805
Mean: 9.00308862
Median Absolute Deviation (MAD): 0
Skewness: 1.247909903
Sum: 37894
Variance: 142.0886311
Monotonicity: Not monotonic
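A MAD of 0 follows from the zero inflation flagged in the alerts. Of the 4,209 non-missing values, 2,144 are 0, so the median is 0 and more than half of the absolute deviations from it are also 0. Below is a minimal Python sketch of the unscaled median absolute deviation, which appears consistent with the MAD values in this report. The sample lists are made up for illustration:

```python
import statistics

def median_absolute_deviation(xs):
    """Median of the absolute deviations from the median (no 1.4826 scaling)."""
    med = statistics.median(xs)
    return statistics.median(abs(x - med) for x in xs)

# Made-up zero-inflated sample: 6 of 11 values are 0, so the median is 0
# and the median of |x - 0| is also 0.
zero_inflated = [0, 0, 0, 0, 0, 0, 5, 10, 20, 20, 30]
print(median_absolute_deviation(zero_inflated))  # 0

print(median_absolute_deviation([1, 2, 3, 4, 5]))  # 1
```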
Histogram with fixed size bins (bins=33)

Common values

Value | Count | Frequency (%)
0 | 2144 | 50.6%
20 | 734 | 17.3%
30 | 217 | 5.1%
15 | 210 | 5.0%
10 | 143 | 3.4%
9 | 130 | 3.1%
5 | 121 | 2.9%
3 | 100 | 2.4%
40 | 80 | 1.9%
1 | 67 | 1.6%
Other values (23) | 263 | 6.2%

Minimum 10 values

Value | Count | Frequency (%)
0 | 2144 | 50.6%
1 | 67 | 1.6%
2 | 18 | 0.4%
3 | 100 | 2.4%
4 | 9 | 0.2%
5 | 121 | 2.9%
6 | 18 | 0.4%
7 | 12 | 0.3%
8 | 11 | 0.3%
9 | 130 | 3.1%

Maximum 10 values

Value | Count | Frequency (%)
70 | 1 | < 0.1%
60 | 11 | 0.3%
50 | 6 | 0.1%
45 | 3 | 0.1%
43 | 56 | 1.3%
40 | 80 | 1.9%
38 | 1 | < 0.1%
35 | 22 | 0.5%
30 | 217 | 5.1%
29 | 1 | < 0.1%

BPMeds
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 53
Missing (%): 1.3%
Memory size: 247.4 KiB
0.0: 4061
1.0: 124

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12,555
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 4061 | 95.8%
1.0 | 124 | 2.9%
(Missing) | 53 | 1.3%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
0.0 | 4061 | 97.0%
1.0 | 124 | 3.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 8246 | 65.7%
. | 4185 | 33.3%
1 | 124 | 1.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 8370 | 66.7%
Other Punctuation | 4185 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 8246 | 98.5%
1 | 124 | 1.5%

Other Punctuation
Value | Count | Frequency (%)
. | 4185 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 12555 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 8246 | 65.7%
. | 4185 | 33.3%
1 | 124 | 1.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 12555 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 8246 | 65.7%
. | 4185 | 33.3%
1 | 124 | 1.0%

prevalentStroke
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 240.2 KiB
0: 4213
1: 25

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4,238
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 4213 | 99.4%
1 | 25 | 0.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
0 | 4213 | 99.4%
1 | 25 | 0.6%

Most occurring characters

Value | Count | Frequency (%)
0 | 4213 | 99.4%
1 | 25 | 0.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 4238 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 4213 | 99.4%
1 | 25 | 0.6%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4238 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 4213 | 99.4%
1 | 25 | 0.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4238 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 4213 | 99.4%
1 | 25 | 0.6%

prevalentHyp
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 240.2 KiB
0: 2922
1: 1316

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4,238
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 2922 | 68.9%
1 | 1316 | 31.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
0 | 2922 | 68.9%
1 | 1316 | 31.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 2922 | 68.9%
1 | 1316 | 31.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 4238 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 2922 | 68.9%
1 | 1316 | 31.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4238 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 2922 | 68.9%
1 | 1316 | 31.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4238 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 2922 | 68.9%
1 | 1316 | 31.1%

diabetes
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 240.2 KiB
0: 4129
1: 109

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4,238
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 4129 | 97.4%
1 | 109 | 2.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
0 | 4129 | 97.4%
1 | 109 | 2.6%

Most occurring characters

Value | Count | Frequency (%)
0 | 4129 | 97.4%
1 | 109 | 2.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 4238 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 4129 | 97.4%
1 | 109 | 2.6%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4238 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 4129 | 97.4%
1 | 109 | 2.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4238 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 4129 | 97.4%
1 | 109 | 2.6%

totChol
Real number (ℝ≥0)

MISSING

Distinct: 248
Distinct (%): 5.9%
Missing: 50
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 236.7215855
Minimum: 107
Maximum: 696
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 107
5th percentile: 170
Q1: 206
Median: 234
Q3: 263
95th percentile: 312
Maximum: 696
Range: 589
Interquartile range (IQR): 57

Descriptive statistics

Standard deviation: 44.59033432
Coefficient of variation (CV): 0.1883661527
Kurtosis: 4.131581824
Mean: 236.7215855
Median Absolute Deviation (MAD): 29
Skewness: 0.8714220097
Sum: 991390
Variance: 1988.297915
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
240 | 85 | 2.0%
220 | 70 | 1.7%
260 | 62 | 1.5%
210 | 61 | 1.4%
232 | 59 | 1.4%
250 | 57 | 1.3%
200 | 56 | 1.3%
225 | 54 | 1.3%
230 | 54 | 1.3%
205 | 53 | 1.3%
Other values (238) | 3577 | 84.4%
(Missing) | 50 | 1.2%

Minimum 10 values

Value | Count | Frequency (%)
107 | 1 | < 0.1%
113 | 1 | < 0.1%
119 | 1 | < 0.1%
124 | 1 | < 0.1%
126 | 1 | < 0.1%
129 | 1 | < 0.1%
133 | 1 | < 0.1%
135 | 2 | < 0.1%
137 | 1 | < 0.1%
140 | 2 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
696 | 1 | < 0.1%
600 | 1 | < 0.1%
464 | 1 | < 0.1%
453 | 1 | < 0.1%
439 | 1 | < 0.1%
432 | 1 | < 0.1%
410 | 3 | 0.1%
405 | 1 | < 0.1%
398 | 1 | < 0.1%
392 | 1 | < 0.1%

sysBP
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 234
Distinct (%): 5.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 132.3524068
Minimum: 83.5
Maximum: 295
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 83.5
5th percentile: 104
Q1: 117
Median: 128
Q3: 144
95th percentile: 175
Maximum: 295
Range: 211.5
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 22.03809664
Coefficient of variation (CV): 0.1665107358
Kurtosis: 2.155019383
Mean: 132.3524068
Median Absolute Deviation (MAD): 13
Skewness: 1.145362136
Sum: 560909.5
Variance: 485.6777037
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
120 | 107 | 2.5%
130 | 102 | 2.4%
110 | 96 | 2.3%
115 | 89 | 2.1%
125 | 88 | 2.1%
124 | 84 | 2.0%
122 | 80 | 1.9%
126 | 73 | 1.7%
128 | 73 | 1.7%
123 | 72 | 1.7%
Other values (224) | 3374 | 79.6%

Minimum 10 values

Value | Count | Frequency (%)
83.5 | 2 | < 0.1%
85 | 1 | < 0.1%
85.5 | 1 | < 0.1%
90 | 2 | < 0.1%
92 | 1 | < 0.1%
92.5 | 2 | < 0.1%
93 | 2 | < 0.1%
93.5 | 2 | < 0.1%
94 | 3 | 0.1%
95 | 7 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
295 | 1 | < 0.1%
248 | 1 | < 0.1%
244 | 1 | < 0.1%
243 | 1 | < 0.1%
235 | 1 | < 0.1%
232 | 1 | < 0.1%
230 | 1 | < 0.1%
220 | 2 | < 0.1%
217 | 1 | < 0.1%
215 | 3 | 0.1%

diaBP
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 146
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 82.8934639
Minimum: 48
Maximum: 142.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 48
5th percentile: 66
Q1: 75
Median: 82
Q3: 89.875
95th percentile: 104.575
Maximum: 142.5
Range: 94.5
Interquartile range (IQR): 14.875
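Several quantiles above fall between observed values, such as Q3 = 89.875 and the 95th percentile of 104.575. The diaBP values themselves appear to be recorded in steps of 0.5. This suggests linear interpolation between order statistics, which is the default rule in pandas and NumPy. Below is a minimal stdlib sketch of that rule, where `quantile_linear` is an illustrative name. The cross-check relies on `statistics.quantiles(..., method="inclusive")` using the same interpolation:

```python
import math
import statistics

def quantile_linear(xs, q):
    """Quantile q (0 <= q <= 1) by linear interpolation between order statistics.

    The target position in the sorted data is h = (n - 1) * q; the result
    interpolates between the values just below and just above h.
    """
    s = sorted(xs)
    h = (len(s) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])

data = [1, 2, 3, 4]
print([quantile_linear(data, q) for q in (0.25, 0.5, 0.75)])  # [1.75, 2.5, 3.25]
print(statistics.quantiles(data, n=4, method="inclusive"))     # [1.75, 2.5, 3.25]
```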

Descriptive statistics

Standard deviation: 11.9108496
Coefficient of variation (CV): 0.1436886461
Kurtosis: 1.277099606
Mean: 82.8934639
Median Absolute Deviation (MAD): 7.5
Skewness: 0.714102184
Sum: 351302.5
Variance: 141.8683382
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
80 | 262 | 6.2%
82 | 152 | 3.6%
85 | 137 | 3.2%
70 | 135 | 3.2%
81 | 131 | 3.1%
84 | 122 | 2.9%
90 | 119 | 2.8%
78 | 116 | 2.7%
87 | 113 | 2.7%
75 | 108 | 2.5%
Other values (136) | 2843 | 67.1%

Minimum 10 values

Value | Count | Frequency (%)
48 | 1 | < 0.1%
50 | 1 | < 0.1%
51 | 1 | < 0.1%
52 | 2 | < 0.1%
53 | 1 | < 0.1%
54 | 1 | < 0.1%
55 | 3 | 0.1%
56 | 2 | < 0.1%
57 | 6 | 0.1%
57.5 | 3 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
142.5 | 1 | < 0.1%
140 | 1 | < 0.1%
136 | 2 | < 0.1%
135 | 2 | < 0.1%
133 | 2 | < 0.1%
132 | 1 | < 0.1%
130 | 5 | 0.1%
129 | 1 | < 0.1%
128 | 1 | < 0.1%
127.5 | 1 | < 0.1%

BMI
Real number (ℝ≥0)

Distinct: 1363
Distinct (%): 32.3%
Missing: 19
Missing (%): 0.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.80200758
Minimum: 15.54
Maximum: 56.8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 15.54
5th percentile: 20.06
Q1: 23.07
Median: 25.4
Q3: 28.04
95th percentile: 32.782
Maximum: 56.8
Range: 41.26
Interquartile range (IQR): 4.97

Descriptive statistics

Standard deviation: 4.080111062
Coefficient of variation (CV): 0.1581315349
Kurtosis: 2.656838673
Mean: 25.80200758
Median Absolute Deviation (MAD): 2.49
Skewness: 0.9819743064
Sum: 108858.67
Variance: 16.64730628
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
22.91 | 18 | 0.4%
22.54 | 18 | 0.4%
23.48 | 18 | 0.4%
22.19 | 18 | 0.4%
23.09 | 16 | 0.4%
25.09 | 16 | 0.4%
23.1 | 13 | 0.3%
22.73 | 13 | 0.3%
25.23 | 13 | 0.3%
27.78 | 12 | 0.3%
Other values (1353) | 4064 | 95.9%
(Missing) | 19 | 0.4%

Minimum 10 values

Value | Count | Frequency (%)
15.54 | 1 | < 0.1%
15.96 | 1 | < 0.1%
16.48 | 1 | < 0.1%
16.59 | 2 | < 0.1%
16.61 | 1 | < 0.1%
16.69 | 1 | < 0.1%
16.71 | 1 | < 0.1%
16.73 | 1 | < 0.1%
16.75 | 1 | < 0.1%
16.87 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
56.8 | 1 | < 0.1%
51.28 | 1 | < 0.1%
45.8 | 1 | < 0.1%
45.79 | 1 | < 0.1%
44.71 | 1 | < 0.1%
44.55 | 1 | < 0.1%
44.27 | 1 | < 0.1%
44.09 | 1 | < 0.1%
43.69 | 1 | < 0.1%
43.67 | 1 | < 0.1%

heartRate
Real number (ℝ≥0)

Distinct: 73
Distinct (%): 1.7%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 75.87892377
Minimum: 44
Maximum: 143
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 44
5th percentile: 60
Q1: 68
Median: 75
Q3: 83
95th percentile: 98
Maximum: 143
Range: 99
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 12.02659635
Coefficient of variation (CV): 0.158497192
Kurtosis: 0.9074832435
Mean: 75.87892377
Median Absolute Deviation (MAD): 7
Skewness: 0.6444817335
Sum: 321499
Variance: 144.6390198
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
75 | 563 | 13.3%
80 | 385 | 9.1%
70 | 305 | 7.2%
60 | 231 | 5.5%
85 | 227 | 5.4%
72 | 222 | 5.2%
65 | 197 | 4.6%
90 | 172 | 4.1%
68 | 151 | 3.6%
100 | 98 | 2.3%
Other values (63) | 1686 | 39.8%

Minimum 10 values

Value | Count | Frequency (%)
44 | 1 | < 0.1%
45 | 2 | < 0.1%
46 | 1 | < 0.1%
47 | 1 | < 0.1%
48 | 5 | 0.1%
50 | 22 | 0.5%
51 | 1 | < 0.1%
52 | 17 | 0.4%
53 | 11 | 0.3%
54 | 12 | 0.3%

Maximum 10 values

Value | Count | Frequency (%)
143 | 1 | < 0.1%
140 | 1 | < 0.1%
130 | 1 | < 0.1%
125 | 3 | 0.1%
122 | 2 | < 0.1%
120 | 7 | 0.2%
115 | 5 | 0.1%
112 | 3 | 0.1%
110 | 36 | 0.8%
108 | 8 | 0.2%

glucose
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 143
Distinct (%): 3.7%
Missing: 388
Missing (%): 9.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 81.96675325
Minimum: 40
Maximum: 394
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 40
5th percentile: 62
Q1: 71
Median: 78
Q3: 87
95th percentile: 108.55
Maximum: 394
Range: 354
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 23.95999819
Coefficient of variation (CV): 0.2923136179
Kurtosis: 58.67427779
Mean: 81.96675325
Median Absolute Deviation (MAD): 8
Skewness: 6.213401854
Sum: 315572
Variance: 574.0815132
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
75 | 193 | 4.6%
77 | 167 | 3.9%
73 | 156 | 3.7%
80 | 152 | 3.6%
70 | 152 | 3.6%
83 | 151 | 3.6%
78 | 148 | 3.5%
74 | 141 | 3.3%
85 | 127 | 3.0%
76 | 127 | 3.0%
Other values (133) | 2336 | 55.1%
(Missing) | 388 | 9.2%

Minimum 10 values

Value | Count | Frequency (%)
40 | 2 | < 0.1%
43 | 1 | < 0.1%
44 | 2 | < 0.1%
45 | 4 | 0.1%
47 | 3 | 0.1%
48 | 1 | < 0.1%
50 | 3 | 0.1%
52 | 2 | < 0.1%
53 | 5 | 0.1%
54 | 5 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
394 | 2 | < 0.1%
386 | 1 | < 0.1%
370 | 1 | < 0.1%
368 | 1 | < 0.1%
348 | 1 | < 0.1%
332 | 1 | < 0.1%
325 | 1 | < 0.1%
320 | 1 | < 0.1%
297 | 1 | < 0.1%
294 | 1 | < 0.1%

TenYearCHD
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 240.2 KiB
0: 3594
1: 644

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4,238
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 3594 | 84.8%
1 | 644 | 15.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
0 | 3594 | 84.8%
1 | 644 | 15.2%

Most occurring characters

Value | Count | Frequency (%)
0 | 3594 | 84.8%
1 | 644 | 15.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 4238 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 3594 | 84.8%
1 | 644 | 15.2%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4238 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 3594 | 84.8%
1 | 644 | 15.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4238 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 3594 | 84.8%
1 | 644 | 15.2%

Interactions

[Figure: pairwise interaction plots of the 8 numeric variables (8 × 8 panels)]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
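The rank-based definition can be sketched directly. First replace each value by its rank, with tied values sharing the average rank. Then compute Pearson's r on those ranks. This is a minimal stdlib sketch with illustrative function names, not the profiler's own code:

```python
import math

def average_ranks(xs):
    """1-based ranks; tied values receive the average of the positions they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Covariance divided by the product of standard deviations (1/(n-1) cancels)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing
print(round(spearman(x, y), 12))  # 1.0
print(pearson(x, y) < 1)          # True: the relationship is not linear
```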

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and positive changes in scale of the two variables. It therefore reflects how closely the points follow a straight line, not the slope of that line (apart from its sign).

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
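The definition and the invariance property can be checked numerically. This is a minimal sketch with made-up data, and `pearson` is an illustrative helper. Python 3.10+ also ships `statistics.correlation`, which computes the same coefficient:

```python
import math

def pearson(xs, ys):
    """Pearson's r: covariance over the product of standard deviations.

    The 1/(n-1) factors in the covariance and the standard deviations cancel,
    so plain sums are used.
    """
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up, roughly linear data
r = pearson(x, y)

# Shifting and positively rescaling each variable leaves r unchanged...
r_affine = pearson([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])
print(abs(r - r_affine) < 1e-12)  # True
# ...a negative scale factor only flips the sign...
print(pearson([-v for v in x], y) == -r)  # True
# ...and for an exact line the slope does not matter.
print(round(pearson(x, [0.1 * v for v in x]), 12), round(pearson(x, [50 * v for v in x]), 12))  # 1.0 1.0
```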

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates perfect negative correlation, 0 indicates no correlation, and +1 indicates perfect positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
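The pair-counting definition translates directly into code. This sketch computes the variant described above, often called τa. In pandas, `corr(method="kendall")` delegates to SciPy's τb, which additionally corrects for ties. The two variants agree when neither variable has ties, and whether the profiler uses the pandas route is an assumption here. `kendall_tau_a` is an illustrative name:

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau_a([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
print(kendall_tau_a([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))      # 0.6666666666666666 (5 concordant, 1 discordant)
```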

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The naive empirical estimator of Cramér's V has been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
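The bias correction can be sketched from a contingency table of counts, following the Bergsma (2013) formulas as we understand them. First, φ² = χ²/n is reduced by (k-1)(r-1)/(n-1) and clipped at 0. Next, the table dimensions r and k are adjusted in the same spirit. The sketch computes χ² without Yates' continuity correction, which may differ from the report's settings. It also assumes every row and column total is non-zero. `cramers_v_corrected` is an illustrative name:

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k table of counts (list of rows)."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic, no continuity correction.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(r_corr - 1, k_corr - 1))

print(cramers_v_corrected([[25, 25], [25, 25]]))          # 0.0 (independent)
print(round(cramers_v_corrected([[50, 0], [0, 50]]), 9))  # 1.0 (perfect association)
```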

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
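Nullity correlation can be sketched as the Pearson correlation between per-column missingness indicators, which are 1 where a value is missing and 0 otherwise. This sketch assumes the heatmap follows the convention of the missingno library, whose plots these captions describe. The columns below are made up, with `None` marking a missing value:

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the is-missing indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        # A column that is never (or always) missing has no defined nullity correlation.
        return float("nan")
    return cov / (sa * sb)

col_a = [1.0, None, 3.0, None, 5.0]
col_b = [7.0, None, 9.0, None, 2.0]  # missing in exactly the same rows as col_a
col_c = [4.0, 6.0, None, 8.0, 1.0]   # missing only where col_a is present
print(round(nullity_correlation(col_a, col_b), 12))  # 1.0
print(nullity_correlation(col_a, col_c) < 0)         # True
print(nullity_correlation(col_a, [1, 2, 3, 4, 5]))   # nan
```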

Sample

First rows

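Index | male | age | education | currentSmoker | cigsPerDay | BPMeds | prevalentStroke | prevalentHyp | diabetes | totChol | sysBP | diaBP | BMI | heartRate | glucose | TenYearCHD
0 | 1 | 39 | 4.0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 195.0 | 106.0 | 70.0 | 26.97 | 80.0 | 77.0 | 0
1 | 0 | 46 | 2.0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 250.0 | 121.0 | 81.0 | 28.73 | 95.0 | 76.0 | 0
2 | 1 | 48 | 1.0 | 1 | 20.0 | 0.0 | 0 | 0 | 0 | 245.0 | 127.5 | 80.0 | 25.34 | 75.0 | 70.0 | 0
3 | 0 | 61 | 3.0 | 1 | 30.0 | 0.0 | 0 | 1 | 0 | 225.0 | 150.0 | 95.0 | 28.58 | 65.0 | 103.0 | 1
4 | 0 | 46 | 3.0 | 1 | 23.0 | 0.0 | 0 | 0 | 0 | 285.0 | 130.0 | 84.0 | 23.10 | 85.0 | 85.0 | 0
5 | 0 | 43 | 2.0 | 0 | 0.0 | 0.0 | 0 | 1 | 0 | 228.0 | 180.0 | 110.0 | 30.30 | 77.0 | 99.0 | 0
6 | 0 | 63 | 1.0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 205.0 | 138.0 | 71.0 | 33.11 | 60.0 | 85.0 | 1
7 | 0 | 45 | 2.0 | 1 | 20.0 | 0.0 | 0 | 0 | 0 | 313.0 | 100.0 | 71.0 | 21.68 | 79.0 | 78.0 | 0
8 | 1 | 52 | 1.0 | 0 | 0.0 | 0.0 | 0 | 1 | 0 | 260.0 | 141.5 | 89.0 | 26.36 | 76.0 | 79.0 | 0
9 | 1 | 43 | 1.0 | 1 | 30.0 | 0.0 | 0 | 1 | 0 | 225.0 | 162.0 | 107.0 | 23.61 | 93.0 | 88.0 | 0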

Last rows

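Index | male | age | education | currentSmoker | cigsPerDay | BPMeds | prevalentStroke | prevalentHyp | diabetes | totChol | sysBP | diaBP | BMI | heartRate | glucose | TenYearCHD
4228 | 0 | 50 | 1.0 | 0 | 0.0 | 0.0 | 0 | 1 | 1 | 260.0 | 190.0 | 130.0 | 43.67 | 85.0 | 260.0 | 0
4229 | 0 | 51 | 3.0 | 1 | 20.0 | 0.0 | 0 | 1 | 0 | 251.0 | 140.0 | 80.0 | 25.60 | 75.0 | NaN | 0
4230 | 0 | 56 | 1.0 | 1 | 3.0 | 0.0 | 0 | 1 | 0 | 268.0 | 170.0 | 102.0 | 22.89 | 57.0 | NaN | 0
4231 | 1 | 58 | 3.0 | 0 | 0.0 | 0.0 | 0 | 1 | 0 | 187.0 | 141.0 | 81.0 | 24.96 | 80.0 | 81.0 | 0
4232 | 1 | 68 | 1.0 | 0 | 0.0 | 0.0 | 0 | 1 | 0 | 176.0 | 168.0 | 97.0 | 23.14 | 60.0 | 79.0 | 1
4233 | 1 | 50 | 1.0 | 1 | 1.0 | 0.0 | 0 | 1 | 0 | 313.0 | 179.0 | 92.0 | 25.97 | 66.0 | 86.0 | 1
4234 | 1 | 51 | 3.0 | 1 | 43.0 | 0.0 | 0 | 0 | 0 | 207.0 | 126.5 | 80.0 | 19.71 | 65.0 | 68.0 | 0
4235 | 0 | 48 | 2.0 | 1 | 20.0 | NaN | 0 | 0 | 0 | 248.0 | 131.0 | 72.0 | 22.00 | 84.0 | 86.0 | 0
4236 | 0 | 44 | 1.0 | 1 | 15.0 | 0.0 | 0 | 0 | 0 | 210.0 | 126.5 | 87.0 | 19.16 | 86.0 | NaN | 0
4237 | 0 | 52 | 2.0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 269.0 | 133.5 | 83.0 | 21.47 | 80.0 | 107.0 | 0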